Increase vector size limit to 4096 #680

Merged: 1 commit merged on Aug 6, 2024


sarthakn7 (Contributor) commented Aug 5, 2024

RP-11388

Lucene has made the maximum number of vector dimensions a property of the codec. We override the corresponding method to raise the limit to 4096.
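A minimal sketch of the override pattern, assuming a simplified format class. In recent Lucene versions (9.8 and later) the per-format limit appears to be exposed as `KnnVectorsFormat#getMaxDimensions(String fieldName)`, but check the Lucene version in use. The classes below are hypothetical stand-ins so the example compiles without Lucene, and `validateDimensions` is an illustrative helper, not a Lucene API.

```java
// Hypothetical stand-in for a codec's vectors format; NOT Lucene's actual class.
class DefaultVectorsFormatSketch {
    // Mirrors Lucene's default limit of 1024 dimensions.
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    // The limit is looked up per field, so a format could vary it by field name.
    int getMaxDimensions(String fieldName) {
        return DEFAULT_MAX_DIMENSIONS;
    }

    // Hypothetical helper: reject a vector whose dimension count exceeds the format's limit.
    final void validateDimensions(String fieldName, int dims) {
        int max = getMaxDimensions(fieldName);
        if (dims <= 0 || dims > max) {
            throw new IllegalArgumentException(
                "field \"" + fieldName + "\": vector dimension " + dims
                    + " must be in [1, " + max + "]");
        }
    }
}

// The change described in this PR: raise the limit to 4096 for every field.
class RaisedLimitVectorsFormatSketch extends DefaultVectorsFormatSketch {
    static final int MAX_DIMENSIONS = 4096;

    @Override
    int getMaxDimensions(String fieldName) {
        return MAX_DIMENSIONS;
    }
}

class VectorLimitDemo {
    public static void main(String[] args) {
        // Accepted: 3072 <= 4096.
        new RaisedLimitVectorsFormatSketch().validateDimensions("embedding", 3072);
        try {
            // Rejected under the default 1024 limit.
            new DefaultVectorsFormatSketch().validateDimensions("embedding", 3072);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the limit as a fixed constant in the override, rather than a user setting, matches the reasoning below about a configurable limit effectively becoming no limit.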

Bigger vectors carry some risks, which are discussed at length in apache/lucene#11507. These are the most relevant points from that issue:

  1. Bigger vectors require a lot more memory.
  2. Each added dimension increases the accumulated floating-point error, and can sometimes produce NaN.
  3. Lucene's default limit is still 1024, so future Lucene changes may be made with that limit in mind.
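To make the first two risks concrete, here is a minimal sketch with hypothetical class and method names (not Lucene code). Raw float32 storage grows linearly with the dimension count, and a naive cosine similarity accumulated in `float` can overflow to infinity and return NaN once enough large components are summed. The component value `3e17f` is an arbitrary assumption chosen so that 1024 dimensions stay finite while 4096 overflow; real embeddings are usually much smaller, and Lucene's own similarity functions may behave differently.

```java
import java.util.Arrays;

// Illustrative sketch of the memory and NaN risks; not Lucene's implementation.
class VectorRiskSketch {
    // Raw storage for float32 vectors: 4 bytes per dimension.
    // Ignores index overhead such as the HNSW graph.
    static long rawVectorBytes(long numVectors, int dims) {
        return numVectors * dims * (long) Float.BYTES;
    }

    // Cosine similarity accumulated entirely in float, as a naive implementation might do.
    static float cosineFloat(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // If the sums overflowed to Infinity, this becomes Infinity / Infinity = NaN.
        return dot / (float) (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Builds a vector with every component set to the same value.
    static float[] filled(int dims, float value) {
        float[] v = new float[dims];
        Arrays.fill(v, value);
        return v;
    }

    public static void main(String[] args) {
        // One million vectors: 4x the dimensions means 4x the raw memory.
        System.out.println(rawVectorBytes(1_000_000, 1024)); // 4096000000
        System.out.println(rawVectorBytes(1_000_000, 4096)); // 16384000000

        // Same component value; only the number of dimensions differs.
        float[] v1024 = filled(1024, 3e17f);
        float[] v4096 = filled(4096, 3e17f);
        System.out.println(cosineFloat(v1024, v1024)); // close to 1
        System.out.println(cosineFloat(v4096, v4096)); // NaN
    }
}
```

The overflow here happens because each squared component is about 9e34, so the running sum passes `Float.MAX_VALUE` (about 3.4e38) somewhere between 1024 and 4096 terms.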

I considered having no limit at all, to allow easy experimentation, but the risks above are significant. I also considered keeping a limit and making it configurable, but I fear that would effectively mean no limit, since users would simply raise it whenever they needed to.
I think it's reasonable to keep a limit but raise it to 4096 (the same as Elasticsearch). This should allow vector sizes large enough for today's experimentation without too much risk to performance, correctness, or compatibility with future Lucene changes.

@sarthakn7 sarthakn7 merged commit 35970d0 into 1.0.0-SNAPSHOT Aug 6, 2024
1 check passed
@sarthakn7 sarthakn7 deleted the sarthakn_increase_vector_size_limit branch August 6, 2024 00:06